    From social media analysis to ubiquitous event monitoring: The case of Turkish tweets

    9th IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (2017, Sydney, Australia).
    The work described in this paper illustrates how social media is a valuable source of data that can be processed to discover informative knowledge and support better decision making. We concentrate on Twitter as the data source; in particular, we extracted and captured tweets written in Turkish. We analyzed the tweets online and in real time to determine the most recent trending events, along with their location and time. The outcome may help predict the next hot events to be broadcast in the news. It may also raise alerts and warn people about upcoming or ongoing disasters or events that should be avoided, e.g., traffic jams, terror attacks, earthquakes, floods, storms, and fires. To achieve this, a tweet may be labeled with more than one event. Named entity recognition combined with multinomial naive Bayes and stochastic gradient descent has been integrated into the process. The reported 95% success rate demonstrates the applicability and effectiveness of the proposed approach. © 2017 Association for Computing Machinery. Sponsors: ACM SIGMOD, Gemalto, IEEE Computer Society, IEEE TCDE, Springer Nature.
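
Because a tweet may carry more than one event label, the multinomial naive Bayes part of the pipeline can be pictured as one binary classifier per event (one-vs-rest). The sketch below is a minimal, stdlib-only illustration under that assumption; it omits the named entity recognition and stochastic gradient descent components, and the class and function names, tokens, and labels are all hypothetical toy values, not the paper's data or code.

```python
import math
from collections import Counter


class BinaryMultinomialNB:
    """Multinomial naive Bayes for one event label vs. the rest, with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, ys):
        # Token counts per class (1 = tweet has the label, 0 = it does not).
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(ys)
        for tokens, y in zip(docs, ys):
            self.word_counts[y].update(tokens)
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def _log_score(self, tokens, c):
        n_docs = self.doc_counts[0] + self.doc_counts[1]
        # Add-one smoothed prior, so a class with no training tweets never gives log(0).
        score = math.log((self.doc_counts[c] + 1) / (n_docs + 2))
        denom = sum(self.word_counts[c].values()) + self.alpha * len(self.vocab)
        for t in tokens:
            if t in self.vocab:  # out-of-vocabulary tokens are ignored
                score += math.log((self.word_counts[c][t] + self.alpha) / denom)
        return score

    def predict(self, tokens):
        return int(self._log_score(tokens, 1) > self._log_score(tokens, 0))


def train_multilabel(docs, label_sets, labels):
    """One-vs-rest (an assumption): one binary classifier per event label."""
    return {
        lab: BinaryMultinomialNB().fit(docs, [int(lab in s) for s in label_sets])
        for lab in labels
    }


def predict_labels(models, tokens):
    """A tweet receives every label whose classifier votes positive (possibly several)."""
    return {lab for lab, model in models.items() if model.predict(tokens)}


# Hypothetical, pre-tokenized toy tweets (deprem = earthquake, trafik = traffic, sel = flood).
docs = [
    ["deprem", "istanbul", "bina"],
    ["trafik", "kaza", "istanbul"],
    ["deprem", "trafik", "yol"],
    ["sel", "yagmur", "ankara"],
    ["trafik", "yol", "ankara"],
]
label_sets = [{"earthquake"}, {"traffic"}, {"earthquake", "traffic"}, {"flood"}, {"traffic"}]
models = train_multilabel(docs, label_sets, ["earthquake", "traffic", "flood"])
print(sorted(predict_labels(models, ["deprem", "trafik", "yol"])))  # -> ['earthquake', 'traffic']
```

One-vs-rest is the simplest way to let a single tweet receive several event labels; it treats the labels as independent, which is a simplification.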

    Reporting and analyzing alternative clustering solutions by employing multi-objective genetic algorithm and conducting experiments on cancer data

    Clustering is an essential research problem that has received considerable attention in the research community for decades. It is challenging because no single solution fits all problems and satisfies all applications. We aim to obtain the most appropriate clustering solution for a given application domain. Moreover, clustering algorithms generally require the number of clusters to be specified in advance, which is hard even for domain experts to estimate, especially in a dynamic environment where the data changes and/or becomes available incrementally. In this paper, we describe and analyze the effectiveness of a robust clustering algorithm, called Multi-objective K-Means Genetic Algorithm (MOKGA), which integrates a multi-objective genetic algorithm into a framework capable of producing alternative clustering solutions. We investigate its application to clustering a variety of datasets, including microarray gene expression data, and the reported results are promising. Though we concentrate on gene expression data, mostly cancer data, the proposed approach is general enough to work equally well on other datasets, as demonstrated on the Iris and Ruspini datasets. Running MOKGA yields a Pareto-optimal front, which gives a set of solutions for the optimal number of clusters. The achieved clustering results are then analyzed and validated with several cluster validity techniques proposed in the literature, and the optimal clusterings are ranked for each validity index. We apply majority voting to decide on the most appropriate set of validity indexes for each tested dataset. The proposed clustering approach is tested by conducting experiments on seven well-cited benchmark datasets. The obtained results are compared with those reported in the literature to demonstrate the applicability and effectiveness of the proposed approach. © 2013 Elsevier B.V. All rights reserved.
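
The Pareto-optimal front mentioned above can be illustrated by its selection step alone: keep every candidate clustering that no other candidate dominates. The sketch below assumes, for illustration only, two minimized objectives per candidate, the number of clusters k and the total within-cluster variation (TWCV); the genetic search that produces the candidates is not shown, and the candidate values are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def twcv(points: List[Point], labels: List[int]) -> float:
    """Total within-cluster variation: sum of squared distances of points to their cluster centroid."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum(sum((m[d] - centroid[d]) ** 2 for d in range(dim)) for m in members)
    return total


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """a dominates b (minimization): no worse in every objective, strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the candidate solutions that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


print(twcv([(0, 0), (0, 2), (10, 10), (10, 12)], [0, 0, 1, 1]))  # -> 4.0

# Hypothetical (k, TWCV) pairs for candidate clusterings produced by the search.
candidates = [(2, 10.0), (3, 6.0), (3, 7.5), (4, 6.0), (5, 1.0)]
print(pareto_front(candidates))  # -> [(2, 10.0), (3, 6.0), (5, 1.0)]
```

Because fewer clusters and lower variation pull in opposite directions, the front keeps one best trade-off per surviving k, which is what lets validity indexes and voting choose among alternatives afterwards.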

    Utilizing maximal frequent itemsets and social network analysis for HIV data analysis

    Acquired immune deficiency syndrome (AIDS) is a deadly disease caused by the human immunodeficiency virus (HIV). The virus attacks the patient's immune system and affects its ability to fight diseases. Developing effective medicine requires understanding the life cycle and replication ability of the virus. The HIV-1 protease enzyme cleaves octamer peptides into shorter peptides, which the virus uses to create proteins. In this paper, a novel feature extraction method is proposed for discovering important patterns in octamer cleavability. The method is based on data mining techniques that find important relations inside a dataset by comprehensively analyzing the given data. As demonstrated in this paper, using the extracted information in the classification process yields important results that may be taken into consideration when developing new medicine. We used the 746, 1625, Impens, and Schilling datasets. In addition, we performed social network analysis as a complementary method.
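
A maximal frequent itemset is a frequent itemset with no frequent proper superset. The stdlib sketch below shows one way to mine them from octamers under a hypothetical encoding (one item per position-residue pair) with an Apriori-style level-wise search, then reuse them as binary features. The encoding, minimum support, helper names, and the three synthetic sequences are assumptions for illustration, not the paper's feature extraction method or data.

```python
from itertools import combinations


def encode(octamer):
    """Hypothetical encoding: one item per (position, residue) pair, positions numbered 1..8."""
    return frozenset(f"P{i}={aa}" for i, aa in enumerate(octamer, start=1))


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search; returns {itemset: support} for support >= min_support."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    frequent = {}
    level = {frozenset([i]) for t in transactions for i in t}
    while level:
        current = {s: support(s) for s in level}
        current = {s: n for s, n in current.items() if n >= min_support}
        frequent.update(current)
        # Join frequent k-itemsets that differ in one item into (k+1)-item candidates.
        level = {a | b for a, b in combinations(current, 2) if len(a | b) == len(a) + 1}
    return frequent


def maximal_itemsets(frequent):
    """Frequent itemsets with no frequent proper superset."""
    return [s for s in frequent if not any(s < other for other in frequent)]


def features(transaction, patterns):
    """Binary feature vector: 1 if the octamer contains the pattern, else 0."""
    return [int(p <= transaction) for p in patterns]


# Synthetic octamers, for illustration only.
txns = [encode(s) for s in ["AKLMNPQR", "AKLMSTVW", "GKLMNPQY"]]
freq = frequent_itemsets(txns, min_support=2)
maxi = maximal_itemsets(freq)
print(sorted(sorted(s) for s in maxi))
# -> [['P1=A', 'P2=K', 'P3=L', 'P4=M'], ['P2=K', 'P3=L', 'P4=M', 'P5=N', 'P6=P', 'P7=Q']]
```

Keeping only maximal itemsets gives a compact pattern set, since every other frequent itemset is a subset of one of them.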

    Fuzzy region connection calculus and its application in fuzzy spatial skyline queries

    Spatial data plays a pivotal role in decision-making applications, and its use in both analysis and decision making is growing at an unprecedented pace. Spatial relations, in turn, form a significant part of how humans understand spatial configurations. Accordingly, the relationships between spatial objects, particularly topological relations, have recently received considerable attention. However, real-world spatial regions such as lakes or forests have no exact boundaries and are considered fuzzy, so defining fuzzy relationships between them would yield better results. Several studies have addressed this issue, and remarkable advances have been achieved. In this paper, we propose a novel method to model the “Part” relation of the fuzzy region connection calculus (RCC). Furthermore, we propose a method based on fuzzy RCC relations for fuzzifying an important group of spatial queries, namely the skyline operator, which can be used in decision support, data visualization, and spatial database applications. The proposed algorithms have been implemented and evaluated on real-world spatial datasets. The evaluation shows greater flexibility than other well-established methods, as well as appropriate speed and result quality.
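
To make the idea of a graded "Part" relation concrete, the sketch below computes a common fuzzy-set inclusion degree, the minimum over all cells of the Łukasiewicz implication min(1, 1 - A(x) + B(x)), for two regions rasterized on the same grid. This is a stand-in assumption for illustration, not the novel Part model proposed in the paper, and the skyline fuzzification is not shown; the grids and membership values are hypothetical.

```python
from typing import List

Raster = List[List[float]]  # membership degree in [0, 1] for each grid cell


def lukasiewicz_implication(a: float, b: float) -> float:
    return min(1.0, 1.0 - a + b)


def part_degree(region_a: Raster, region_b: Raster) -> float:
    """Degree to which fuzzy region A is part of fuzzy region B.

    Assumption: inclusion degree inf_x I(A(x), B(x)) with the Lukasiewicz
    implication, evaluated over the cells of a shared raster grid.
    """
    return min(
        lukasiewicz_implication(a, b)
        for row_a, row_b in zip(region_a, region_b)
        for a, b in zip(row_a, row_b)
    )


# Hypothetical membership rasters on the same 2x2 grid.
lake = [[0.0, 0.4], [0.9, 1.0]]
wetland = [[0.2, 0.6], [1.0, 1.0]]
print(part_degree(lake, wetland))  # -> 1.0
print(round(part_degree(wetland, lake), 2))  # -> 0.8
```

For crisp regions (memberships 0 or 1) the degree reduces to 1 when A is a subset of B and 0 otherwise, so the fuzzy version generalizes the classical relation.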

    Estimating the Importance of Terrorists in a Terror Network

    While criminals may start their activities at the individual level, the same is generally not true for terrorists, who are mostly organized in well-established networks. The effectiveness of a terror network can be assessed by observing many factors, including the volume of activities accomplished by its members, the ability of its members to hide, and the ability of the network to grow and to maintain its influence even after losing some members, including leaders. Social network analysis, data mining, and machine learning techniques can play an important role in measuring the effectiveness of networks in general and terror networks in particular, and they underpin the work presented in this chapter. We present a framework that employs clustering, frequent pattern mining, and some social network analysis measures to determine the effectiveness of a network. The clustering and frequent pattern mining techniques start from the adjacency matrix of the network. For clustering, we use the entries of the matrix, treating each row as an object and each column as a feature; thus, the features of a network member are his or her direct neighbors. For weighted networks, we keep the link weights. For frequent pattern mining, we treat each row of the adjacency matrix as a transaction and each column as an item. Further, we map the entries onto a 0/1 scale: every entry greater than zero is assigned the value one, and all other entries keep the value zero. This way, we can apply frequent pattern mining algorithms to determine the most influential members of a network, as well as the effect of removing some members or even links between members. We also investigate the effect of adding links between members. The aim is to study how the various members of the network change roles as the network evolves, which is measured by applying social network analysis measures to the network at each stage of its development. We report some interesting results for two benchmark networks: the 9/11 network and the Madrid bombing network.
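
The matrix-to-transaction mapping described above can be sketched in a few lines: binarize the adjacency matrix, read each row as a transaction of neighbors, and count how many transactions contain each member. This minimal sketch stops at single-item support (which equals a member's degree in an undirected network) and shows how removing a member changes the picture; the helper names and the five-member weighted network are hypothetical.

```python
from collections import Counter


def binarize(adj):
    """Map every entry > 0 to 1 and every other entry to 0."""
    return [[1 if w > 0 else 0 for w in row] for row in adj]


def to_transactions(adj, names):
    """Row i becomes a transaction whose items are member i's direct neighbors."""
    return [{names[j] for j, v in enumerate(row) if v} for row in binarize(adj)]


def item_support(transactions, min_support=1):
    """Support of each single member = number of transactions (members) that list it."""
    counts = Counter(item for t in transactions for item in t)
    return {item: c for item, c in counts.items() if c >= min_support}


def remove_member(adj, names, who):
    """Delete a member's row and column, as when the member is lost from the network."""
    k = names.index(who)
    new_adj = [[w for j, w in enumerate(row) if j != k] for i, row in enumerate(adj) if i != k]
    return new_adj, [n for n in names if n != who]


# Hypothetical weighted network of five members.
names = ["A", "B", "C", "D", "E"]
adj = [
    [0, 2, 1, 3, 1],
    [2, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [3, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
support = item_support(to_transactions(adj, names))
print(max(support, key=support.get))  # -> A

adj2, names2 = remove_member(adj, names, "A")
still_linked = item_support(to_transactions(adj2, names2))
print(sorted(set(names2) - set(still_linked)))  # -> ['D', 'E']
```

Extending the counting to larger itemsets (groups of members that share neighbors) would follow the same transaction view.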

    Integrating text mining, data mining, and network analysis for identifying genetic breast cancer trends

    Background: Breast cancer is a serious disease that affects many women and may lead to death. It has received considerable attention from the research community, and biomedical researchers aim to find genetic biomarkers indicative of the disease. Novel biomarkers can be elucidated from the existing literature; however, the vast number of scientific publications on breast cancer makes this a daunting task. This paper presents a framework that investigates existing literature data for informative discoveries. It integrates text mining and social network analysis in order to identify new potential biomarkers for breast cancer. Results: We used PubMed for testing. We investigated gene-gene interactions, as well as novel interactions such as gene-year, gene-country, and abstract-country, to find out how the discoveries varied over time and how much the discoveries and interests of research groups in different countries overlap or differ. Conclusions: Interesting trends have been identified and discussed; e.g., different genes are highlighted in relation to different countries, although these genes were found to share functionality. Some text-analysis-based results have been validated against results from other tools that predict gene-gene relations and gene functions. © 2016 Jurca et al.
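
Gene-gene and gene-country (or gene-year) networks of the kind described above can be built as weighted co-occurrence counts over abstracts. The sketch assumes gene mentions have already been extracted from each abstract by some text-mining step, and its record format, helper names, and values are hypothetical, not data from the paper.

```python
from collections import Counter
from itertools import combinations


def gene_gene_edges(records):
    """Weight of a gene pair = number of abstracts that mention both genes."""
    edges = Counter()
    for rec in records:
        for pair in combinations(sorted(set(rec["genes"])), 2):
            edges[pair] += 1
    return edges


def gene_attribute_edges(records, attribute):
    """Bipartite edges such as gene-country or gene-year, weighted by abstract count."""
    edges = Counter()
    for rec in records:
        for gene in set(rec["genes"]):
            edges[(gene, rec[attribute])] += 1
    return edges


# Hypothetical per-abstract annotations.
records = [
    {"genes": ["BRCA1", "TP53"], "country": "US", "year": 2010},
    {"genes": ["BRCA1", "BRCA2"], "country": "UK", "year": 2012},
    {"genes": ["TP53", "BRCA1", "BRCA1"], "country": "US", "year": 2012},
]
print(gene_gene_edges(records).most_common(1))  # -> [(('BRCA1', 'TP53'), 2)]
```

Deduplicating genes per abstract (the `set` call) means an abstract that repeats a gene name still contributes a weight of one, so edge weights count abstracts rather than mentions.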